In-place Update of Suffix Array while Recoding Words
Abstract
Motivated by applications in grammatical inference and data compression, we propose an algorithm that updates a suffix array after some occurrences of a given word in the indexed text are replaced by a new character. Unlike other published index update methods, the problem addressed here may require modifying a large number of distinct positions in the original text. The proposed algorithm exploits the internal order of the suffix array to update groups of entries simultaneously, and ensures that only the entries that must be modified are visited. Experiments confirm a significant speed-up in execution time compared to constructing the suffix array from scratch at each step of the application.
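To make the problem setting concrete, the following minimal sketch shows the baseline that the abstract compares against: recode selected occurrences of a word with a new symbol, then rebuild the suffix array from scratch. It is not the proposed in-place update algorithm. The function names `suffix_array` and `recode`, the 0-based positions, and the naive sort-based construction are all assumptions made for illustration.

```python
def suffix_array(seq):
    """Naive suffix array: sort suffix start positions by the suffix itself.

    Illustrative only; practical constructions are far more efficient.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def recode(seq, word, new_symbol, positions):
    """Replace the occurrences of `word` that start at `positions` by `new_symbol`.

    `positions` are 0-based start indices in `seq` (an assumption of this sketch).
    A position is skipped if `word` does not actually occur there.
    """
    chosen = set(positions)
    out, i = [], 0
    while i < len(seq):
        if i in chosen and seq[i:i + len(word)] == word:
            out.append(new_symbol)
            i += len(word)
        else:
            out.append(seq[i])
            i += 1
    return out


text = list("abracadabra")
sa_before = suffix_array(text)

# Recode both occurrences of "abra" with the new symbol "X",
# then rebuild the index from scratch (the costly baseline).
recoded = recode(text, list("abra"), "X", positions=[0, 7])
sa_after = suffix_array(recoded)

print(sa_before)  # [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
print(recoded)    # ['X', 'c', 'a', 'd', 'X']
print(sa_after)   # [4, 0, 2, 1, 3]
```

Even in this small example, one recoding step changes the text at two distinct places and reorders the whole index, which is why rebuilding at every step of an application becomes expensive.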
Related resources
In-Place Longest Common Extensions
Longest Common Extension (LCE) queries are a fundamental subroutine in many string-processing algorithms, including (but not limited to) suffix sorting, string matching, and identification of palindrome factors and repeats. An LCE query takes as input two positions i, j in a text T ∈ Σ* and returns the length l of the longest common prefix between T's i-th and j-th suffixes. It is clear that we ...
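The LCE query defined above can be sketched by direct character comparison. This is only the naive definition-level version, not the in-place data structure of the cited paper; the function name `lce` and the use of 0-based positions are assumptions made for illustration.

```python
def lce(text, i, j):
    """Length of the longest common prefix of the suffixes text[i:] and text[j:].

    Naive scan: time proportional to the answer, no extra space.
    Positions are 0-based (an assumption of this sketch).
    """
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


print(lce("abracadabra", 0, 7))  # 4, the shared prefix is "abra"
print(lce("abracadabra", 1, 8))  # 3, the shared prefix is "bra"
```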
Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets
Suffix arrays and LCP arrays are among the most fundamental data structures and are widely used for various kinds of string processing. Many problems can be solved efficiently by using suffix arrays, or suffix arrays paired with LCP arrays. In this paper, we consider two problems for a string of length N, the characters of which are represented as integers in [1, ..., σ] for 1 ≤ σ ≤ N; the stri...
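As a small illustration of how a suffix array and an LCP array fit together, the sketch below builds a suffix array naively and then derives the LCP array with the classical linear-time method of Kasai et al. This is not the time- and space-optimal construction of the cited paper; the function names and the convention that `lcp[0]` is 0 are assumptions made for illustration.

```python
def suffix_array(s):
    """Naive suffix array by sorting suffixes (illustrative, not efficient)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """LCP array via Kasai et al.'s method.

    lcp[r] is the length of the longest common prefix of the suffixes
    at sa[r - 1] and sa[r]; lcp[0] is set to 0 by convention here.
    """
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r

    lcp = [0] * n
    h = 0  # current match length, reused between consecutive text positions
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix s[i+1:] shares at least h - 1 characters
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_array("banana", sa))  # [0, 1, 3, 0, 0, 2]
```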
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length n from an alphabet [0, σ − 1] so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. A clear lower bound on the size of any prefix-free encoding of this kind is n log σ + Θ(log(nσ)) bits. We describe a new encoding matching this lower bound when σ ≤ n^{O(1)} while su...
Optimal In-Place Suffix Sorting
The suffix array is a fundamental data structure for many applications that involve string searching and data compression. The design of time- and space-efficient suffix array construction algorithms has attracted significant attention, and considerable advances have been made in the last 20 years. We obtain suffix array construction algorithms that are optimal both in time and space for both intege...
Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n...
Publication date: 2008